class: center, middle, inverse, title-slide

.title[
# Module 2: Data Wrangling
]
.subtitle[
## Introduction to Tools of the Trade in Data Analysis
]
.author[
### Dr. Christopher Kenaley
]
.institute[
### Boston College
]
.date[
### 2024/9/9
]

---
class: inverse, top

# In class today

<!-- Add icon library -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.14.0/css/all.min.css">

.pull-left[
Today we'll:

- Review/Learn about the pipe: `%>%`
- Load some data
- Perform some tidy operations
- Peek under the hood of Module Project 2
]

.pull-right[
![](http://www.alaskapublic.org/wp-content/uploads/2013/09/trans-alaska-pipeline-dnr.jpg)
]

---
class: inverse, top
<!-- slide 1 -->
## What is the pipe (`%>%`)?

- comes from the `magrittr` package
- loaded automatically with the super package `tidyverse`
- makes code concise:
  * streamlines many operations into fewer lines of code
  * reduces repetitive tasks

``` r
library(tidyverse) # attaches dplyr, ggplot2, and readr; makes %>% available
iris <- group_by(iris,Species)
summarise(iris,mean_length=mean(Sepal.Length))
```

```
## # A tibble: 3 × 2
##   Species    mean_length
##   <fct>            <dbl>
## 1 setosa            5.01
## 2 versicolor        5.94
## 3 virginica         6.59
```

---
class: inverse, top
<!-- slide 1 -->
## What is the pipe (`%>%`)?

``` r
iris <- group_by(iris,Species)
summarise(iris,mean_length=mean(Sepal.Length))
```

``` r
iris%>%
  group_by(Species)%>%
  summarize(mean_length=mean(Sepal.Length))
```

```
## # A tibble: 3 × 2
##   Species    mean_length
##   <fct>            <dbl>
## 1 setosa            5.01
## 2 versicolor        5.94
## 3 virginica         6.59
```

---
## What is the pipe (`%>%`)?
- the benefit is more apparent when plotting (a major part of data science)

``` r
iris <- group_by(iris,Species)
iris_mean <- summarise(iris,mean_length=mean(Sepal.Length))
ggplot(data=iris_mean,aes(x=Species,y=mean_length))+geom_bar(stat="identity")
```

``` r
iris%>%
  group_by(Species)%>%
  summarize(mean_length=mean(Sepal.Length))%>%
  ggplot(aes(x=Species,y=mean_length))+geom_bar(stat="identity")
```

![](3140_f24_9-9_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---
## Loading data

- the `readr` package has several handy functions
- `read_csv()` is the handiest

``` r
d <- read_csv("https://bcorgbio.github.io/class/data/coyote.csv")
head(d)
```

```
## # A tibble: 6 × 10
##   id      species       Region    state  County Town  Locality   Lat  Long  Year
##   <chr>   <chr>         <chr>     <chr>  <chr>  <chr> <chr>    <dbl> <dbl> <dbl>
## 1 adk2706 Canis latrans northeast New Y… <NA>   <NA>  <NA>      43.8 -75.0  2007
## 2 adk2798 Canis latrans northeast New Y… <NA>   <NA>  <NA>      43.9 -74.7    NA
## 3 adk2801 Canis latrans northeast New Y… <NA>   <NA>  <NA>      43.9 -74.7    NA
## 4 adk2833 Canis latrans northeast New Y… <NA>   <NA>  <NA>      43.9 -74.8    NA
## 5 adk2845 Canis latrans northeast New Y… <NA>   <NA>  <NA>      42.8 -73.8    NA
## 6 adk2853 Canis latrans northeast New Y… Herkm… Lost… Lost Cr…  43.8 -75.0  2002
```

---
## Loading data

``` r
d%>%
  group_by(state)%>%
  dplyr::summarize(n=n())%>%
  ggplot(aes(x=state,y=n))+geom_bar(stat="identity")+coord_flip()
```

![](3140_f24_9-9_files/figure-html/unnamed-chunk-8-1.png)<!-- -->
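
---
## Checking the pipe against a nested call

Returning to the pipe for a moment: `x %>% f(y)` is evaluated as `f(x, y)`, so a piped chain and the equivalent nested call should give the same summary. The sketch below assumes only `dplyr` and R's built-in `iris` data. The object names `nested` and `piped` are illustrative and are not part of the module code.

``` r
library(dplyr) # dplyr re-exports the %>% pipe from magrittr

# Nested call: must be read from the inside out
nested <- summarise(group_by(iris, Species), mean_length = mean(Sepal.Length))

# Piped call: each result becomes the first argument of the next function,
# so the steps read top to bottom
piped <- iris %>%
  group_by(Species) %>%
  summarise(mean_length = mean(Sepal.Length))

# Compare the per-species means from the two approaches
all.equal(nested$mean_length, piped$mean_length)
```

The nested form gets harder to read with every added step, while the piped form just grows by one line per step.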